Augmenting Mathematical Formulae for More Effective Querying & Presentation
Abstract
1 Summary

Scientists and engineers regularly search for well-established mathematical concepts expressed as mathematical formulae. Conventional search engines today focus on keyword-based text search. An analogous approach does not work for mathematical formulae: knowledge about identifiers alone is not sufficient to derive the semantics of the formula they occur in. Currently, the solution for formula-related inquiries is to consult domain experts, which is slow, expensive and non-deterministic. Consequently, core concepts that enable formula-related queries on potentially large datasets are needed. While earlier attempts addressed the problem as a whole, I identify three mutually orthogonal challenges to formula search.

The first challenge, content augmentation, is to collect the full semantic information about an individual formula from a given input. Most fundamentally, this might start with the digitization of analogue mathematical content; it covers the conversion from imperative typesetting instructions (e.g. TeX) to declarative layout descriptions (e.g. presentation MathML), and also deals with inferring the syntactical structure of a formula (i.e. the expression tree, often represented in content MathML). In addition, this first challenge involves associating formula metadata, such as constraints, identifier definitions, related keywords or substitutions, with individual formulae.

The second challenge is content querying. This ranges from query formulation through query processing, the actual search and hit ranking to result presentation. There are different forms of formula queries. In standard ad-hoc retrieval queries, a user defines the information need and the math information retrieval system returns a ranked list for a particular data set. Similar are interactive formula filter queries, where a user filters a data set interactively until she arrives at a result set that is relevant to her needs. Different are unattended queries, which run in the background to assist authors during editing or to help readers identify related work while viewing a certain formula.

The third challenge is content indexing for growing data sets. This challenge includes the scalable execution of the solutions to the two aforementioned challenges. While well-established techniques from the area of database systems (e.g. XML processing and indexing) can be applied, math-specific complexity problems require individual solutions.

Augmented content (challenge 1) opens up additional options for similarity search and potentially improves the search results regardless of the applied similarity measure. In order to separate the effect of content augmentation from intrinsic improvements …
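To make the notion of an expression tree and a structural formula query more concrete, the following is a minimal sketch and not the system described in the summary. It assumes a hypothetical `Node` type for expression trees, a hypothetical `?` wildcard in queries, and a naive linear scan over a two-formula corpus in place of a real index.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """One node of a formula's expression tree (hypothetical structure)."""
    label: str                          # operator or identifier, e.g. "plus", "x"
    children: Tuple["Node", ...] = ()   # operands, empty for leaves


WILDCARD = "?"  # hypothetical query placeholder that matches any subtree


def matches(query: Node, tree: Node) -> bool:
    """True if `query` matches `tree` exactly at the root, up to wildcards."""
    if query.label == WILDCARD:
        return True
    if query.label != tree.label or len(query.children) != len(tree.children):
        return False
    return all(matches(q, t) for q, t in zip(query.children, tree.children))


def contains(query: Node, tree: Node) -> bool:
    """True if `query` matches `tree` or any of its sub-expressions."""
    return matches(query, tree) or any(contains(query, c) for c in tree.children)


# E = m * c^2
energy = Node("eq", (
    Node("E"),
    Node("times", (Node("m"), Node("power", (Node("c"), Node("2"))))),
))
# a^2 + b^2
pythagoras = Node("plus", (
    Node("power", (Node("a"), Node("2"))),
    Node("power", (Node("b"), Node("2"))),
))

corpus = {"energy": energy, "pythagoras": pythagoras}

# Ad-hoc query: "anything squared"
query = Node("power", (Node(WILDCARD), Node("2")))
hits = [name for name, tree in corpus.items() if contains(query, tree)]
print(hits)
```

Matching on the tree rather than on the rendered symbols is what lets the query ignore which identifier is squared. A keyword search over identifiers alone could not express this, and a scalable system would replace the linear scan with an index.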
Similar resources
A Search Engine for Mathematical Formulae
We present a search engine for mathematical formulae. The MathWebSearch system harvests the web for content representations (currently MathML and OpenMath) of formulae and indexes them with substitution tree indexing, a technique originally developed for accessing intermediate results in automated theorem provers. For querying, we present a generic language extension approach that allows construc...
A New High Order Closed Newton-Cotes Trigonometrically-fitted Formulae for the Numerical Solution of the Schrodinger Equation
In this paper, we investigate the connection between closed Newton-Cotes formulae, trigonometrically-fitted methods, symplectic integrators and efficient integration of the Schrödinger equation. The study of multistep symplectic integrators is very poor although in the last decades several one step symplectic integrators have been produced based on symplectic geometry (see the relevant lit...
MathWebSearch 0.5: Scaling an Open Formula Search Engine
MathWebSearch is an open-source, open-format, content-oriented search engine for mathematical formulae. It is a complete system capable of crawling, indexing, and querying expressions based on their functional structure (operator tree) rather than their presentation. In version 0.5, we concentrate on scalability issues in MathWebSearch to take advantage of corpora in the giga-formula range. We r...
Cost-Effective Combination of Multiple Rankers: Learning When Not To Query
Combining multiple rankers has potential for improving the performance over using any of the single rankers. However, querying multiple rankers for every search request can often be too costly due to efficiency or commercial reasons. In this work, we propose a more cost-effective approach that predicts the utility of any additional rankers, prior to querying them. We develop a combined measure ...
Indexing and Searching Mathematics in Digital Libraries
This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system is presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware...